Analyzing the Performance Portability of Tensor Decomposition
We employ pressure point analysis and roofline modeling to identify
performance bottlenecks and determine an upper bound on the performance of the
Canonical Polyadic Alternating Poisson Regression Multiplicative Update (CP-APR
MU) algorithm in the SparTen software library. Our analyses reveal that a
particular matrix computation is the critical performance bottleneck in the
SparTen CP-APR MU implementation. Moreover, we find that
atomic operations are not a critical bottleneck while higher cache reuse can
provide a non-trivial performance improvement. We also use grid search over the
Kokkos library parallel policy parameters to achieve a 2.25x average speedup
over the SparTen default for this computation on CPU and 1.70x on GPU.
We conclude our investigations by comparing Kokkos implementations of the
STREAM benchmark and the matricized tensor times Khatri-Rao product (MTTKRP)
benchmark from the Parallel Sparse Tensor Algorithm (PASTA) benchmark suite to
implementations using vendor libraries. We show that with a single
implementation Kokkos achieves performance comparable to hand-tuned code for
fundamental operations that make up tensor decomposition kernels on a wide
range of CPU and GPU systems. Overall, we conclude that Kokkos demonstrates
good performance portability for simple data-intensive operations but requires
tuning for algorithms with more complex dependencies and data access patterns.
Comment: 28 pages, 19 figures
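The MTTKRP kernel benchmarked above is the workhorse of CP decomposition. As an illustrative sketch only (not the SparTen, PASTA, or Kokkos implementation), a NumPy version of the mode-0 MTTKRP for a 3-way tensor, cross-checked against the unfolding-times-Khatri-Rao definition; all names and the row-major unfolding convention are assumptions for this example:

```python
import numpy as np

def mttkrp_mode0(X, B, C):
    """Mode-0 MTTKRP: M[i, r] = sum_{j,k} X[i, j, k] * B[j, r] * C[k, r]."""
    return np.einsum('ijk,jr,kr->ir', X, B, C)

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product of A (J x R) and B (K x R) -> (J*K x R)."""
    J, R = A.shape
    K, _ = B.shape
    return (A[:, None, :] * B[None, :, :]).reshape(J * K, R)

rng = np.random.default_rng(0)
I, J, K, R = 4, 5, 6, 3
X = rng.random((I, J, K))
B = rng.random((J, R))
C = rng.random((K, R))

M = mttkrp_mode0(X, B, C)
# Cross-check against the textbook definition via the mode-0 unfolding:
# reshape(I, J*K) flattens with k varying fastest, matching khatri_rao(B, C).
M_ref = X.reshape(I, J * K) @ khatri_rao(B, C)
```

For sparse tensors the einsum is replaced by a loop over nonzeros, which is where the atomic-update and cache-reuse questions studied in the paper arise.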
Scalable Tensor Factorizations for Incomplete Data
The problem of incomplete data - i.e., data with missing or unknown values -
in multi-way arrays is ubiquitous in biomedical signal processing, network
traffic analysis, bibliometrics, social network analysis, chemometrics,
computer vision, communication networks, etc. We consider the problem of how to
factorize data sets with missing values with the goal of capturing the
underlying latent structure of the data and possibly reconstructing missing
values (i.e., tensor completion). We focus on one of the most well-known tensor
factorizations that captures multi-linear structure, CANDECOMP/PARAFAC (CP). In
the presence of missing data, CP can be formulated as a weighted least squares
problem that models only the known entries. We develop an algorithm called
CP-WOPT (CP Weighted OPTimization) that uses a first-order optimization
approach to solve the weighted least squares problem. Based on extensive
numerical experiments, our algorithm is shown to successfully factorize tensors
with noise and up to 99% missing data. A unique aspect of our approach is that
it scales to sparse large-scale data, e.g., 1000 x 1000 x 1000 with five
million known entries (0.5% dense). We further demonstrate the usefulness of
CP-WOPT on two real-world applications: a novel EEG (electroencephalogram)
application where missing data is frequently encountered due to disconnections
of electrodes, and the problem of modeling computer network traffic, where data
may be absent due to the expense of the data collection process.
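The weighted least-squares formulation behind CP-WOPT can be sketched concretely. The following is a minimal illustration of the objective and its mode-0 gradient for a dense 3-way tensor, assuming a binary observation mask; it is not the scalable sparse implementation described in the paper, and all variable names are this example's own:

```python
import numpy as np

def cp_model(A, B, C):
    """Full 3-way CP reconstruction [[A, B, C]]."""
    return np.einsum('ir,jr,kr->ijk', A, B, C)

def wopt_loss_grad_A(X, W, A, B, C):
    """CP-WOPT-style weighted least squares over known entries (W binary):
    f = 0.5 * || W * (X - [[A, B, C]]) ||_F^2, with the gradient in the
    mode-0 factor A given by an MTTKRP of the masked residual."""
    R = W * (cp_model(A, B, C) - X)           # residual restricted to known entries
    f = 0.5 * np.sum(R ** 2)
    gA = np.einsum('ijk,jr,kr->ir', R, B, C)  # uses W*W == W for a binary mask
    return f, gA

rng = np.random.default_rng(1)
I, J, K, R_rank = 4, 5, 6, 2
X = rng.random((I, J, K))
W = (rng.random((I, J, K)) < 0.7).astype(float)  # ~70% of entries observed
A = rng.random((I, R_rank))
B = rng.random((J, R_rank))
C = rng.random((K, R_rank))
f0, gA = wopt_loss_grad_A(X, W, A, B, C)
```

Any first-order optimizer can then be driven by this function-and-gradient pair, which is the structure CP-WOPT exploits.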
QCS: a system for querying, clustering and summarizing documents.
Information retrieval systems consist of many complicated components. Research and development of such systems is often hampered by the difficulty of evaluating how each particular component would behave across multiple systems. We present a novel hybrid information retrieval system--the Query, Cluster, Summarize (QCS) system--which is portable, modular, and permits experimentation with different instantiations of each of the constituent text analysis components. Most importantly, the combination of the three types of components in the QCS design improves retrievals by providing users more focused information organized by topic. We demonstrate the improved performance through a series of experiments using standard test sets from the Document Understanding Conferences (DUC) along with the best-known automatic metric for summarization system evaluation, ROUGE. Although the DUC data and evaluations were originally designed to test multidocument summarization, we developed a framework to extend them to the evaluation of each of the three components: query, clustering, and summarization. Under this framework, we then demonstrate that the QCS system (end-to-end) achieves performance as good as or better than the best summarization engines. Given a query, QCS retrieves relevant documents, separates the retrieved documents into topic clusters, and creates a single summary for each cluster. In the current implementation, Latent Semantic Indexing is used for retrieval, generalized spherical k-means is used for document clustering, and a method coupling sentence 'trimming' and a hidden Markov model, followed by a pivoted QR decomposition, is used to create a single extract summary for each cluster. The user interface is designed to provide access to detailed information in a compact and useful format.
Our system demonstrates the feasibility of assembling an effective IR system from existing software libraries, the usefulness of the modularity of the design, and the value of this particular combination of modules.
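The clustering stage above uses generalized spherical k-means, which clusters unit-length document vectors by cosine similarity. A minimal NumPy sketch of the basic idea follows (not the QCS implementation; the `init` parameter and the toy two-topic data are assumptions of this example):

```python
import numpy as np

def spherical_kmeans(X, k, iters=20, init=None, seed=0):
    """Minimal spherical k-means: rows of X are normalized to unit length
    and clustered by cosine similarity; centroids stay on the unit sphere."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    idx = init if init is not None else rng.choice(len(X), size=k, replace=False)
    centroids = X[idx].copy()
    for _ in range(iters):
        labels = np.argmax(X @ centroids.T, axis=1)  # cosine similarity of unit vectors
        for c in range(k):
            members = X[labels == c]
            if len(members):                          # keep old centroid if cluster empties
                m = members.sum(axis=0)
                centroids[c] = m / np.linalg.norm(m)
    return labels, centroids

# Two planted "topics": directions near (1, 0) and near (0, 1).
docs = np.array([[1.0, 0.00], [1.0, 0.03], [1.0, -0.02], [1.0, 0.05], [1.0, 0.01],
                 [0.02, 1.0], [-0.01, 1.0], [0.04, 1.0], [0.00, 1.0], [0.03, 1.0]])
labels, cents = spherical_kmeans(docs, k=2, init=[0, 9])
```

In a real pipeline the rows would be term-weighted document vectors (e.g., from the Latent Semantic Indexing stage), where cosine similarity is the natural notion of topical closeness.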
Homotopy Optimization Methods and Protein Structure Prediction
A central challenge in biochemistry today is the problem of predicting the tertiary (three-dimensional) structure of a protein in its native state given its amino acid sequence. According to Anfinsen’
Homotopy Optimization Methods for Global Optimization
We define a new method for global optimization, the Homotopy Optimization Method (HOM). This method differs from previous homotopy and continuation methods in that its aim is to find a minimizer for each of a set of values of the homotopy parameter, rather than to follow a path of minimizers. We define a second method, called HOPE, by allowing HOM to follow an ensemble of points obtained by perturbation of previous ones. We relate this new method to standard methods such as simulated annealing and show under what circumstances it is superior. We present the results of extensive numerical experiments demonstrating the performance of HOM and HOPE.
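The HOM scheme just described, minimizing H(x, t) = (1 - t)g(x) + tf(x) for a sequence of t values and warm-starting each stage at the previous minimizer, can be sketched in one dimension. The inner gradient-descent solver, the easy function g, the tilted double-well target f, and all step sizes are illustrative assumptions, not details from the paper:

```python
import numpy as np

def grad_descent(grad, x0, lr=0.01, steps=500):
    """Plain gradient-descent inner solver (illustrative stand-in)."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x

def hom(f_grad, g_grad, x0, n_stages=10, lr=0.01, steps=500):
    """HOM sketch: minimize H(x, t) = (1 - t)*g(x) + t*f(x) for
    t = 1/n, 2/n, ..., 1, warm-starting each stage at the previous
    stage's minimizer rather than following a path of minimizers."""
    x = x0
    for k in range(1, n_stages + 1):
        t = k / n_stages
        h_grad = lambda x, t=t: (1 - t) * g_grad(x) + t * f_grad(x)
        x = grad_descent(h_grad, x, lr=lr, steps=steps)
    return x

# Illustrative 1-D instance: easy g(x) = x^2 deformed into the
# tilted double well f(x) = (x^2 - 1)^2 + 0.3*x.
f_grad = lambda x: 4 * x * (x ** 2 - 1) + 0.3
g_grad = lambda x: 2 * x
x_star = hom(f_grad, g_grad, x0=0.0)
```

HOPE, in this sketch's terms, would carry an ensemble of perturbed copies of x between stages instead of a single point.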